23 research outputs found

Geo-located Twitter as the proxy for global mobility patterns

Author: Beinat Euro
Hawelka Bartosz
Kazakopoulos Pavlos
Ratti Carlo
Sitko Izabela
Sobolevsky Stanislav
Publication venue: 'Informa UK Limited'
Publication date: 01/10/2013

With the advent of pervasive location-sharing services, researchers have gained unprecedented access to direct records of human activity in space and time. This paper analyses geo-located Twitter messages in order to uncover global patterns of human mobility. Based on a dataset of almost a billion tweets recorded in 2012, we estimate the volume of international travelers by country of residence. We examine the mobility profiles of different nations, looking at characteristics such as mobility rate, radius of gyration, diversity of destinations, and the balance of inflows and outflows. The temporal patterns disclose the universal seasons of increased international mobility and the peculiar national nature of overseas travel. Our analysis of the community structure of the Twitter mobility network, obtained with iterative network partitioning, reveals spatially cohesive regions that follow the regional division of the world. Finally, we validate our results against global tourism statistics and mobility models provided by other authors, and argue that Twitter is a viable source for understanding and quantifying global mobility patterns. Comment: 17 pages, 13 figures
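The radius of gyration named in this abstract is commonly defined as the root-mean-square distance of a user's recorded locations from their centroid. The following is a minimal sketch under stated assumptions, not the authors' implementation: the centroid is approximated by the mean latitude and longitude (reasonable only for points that are not spread across the antimeridian or the poles), distances use the haversine formula, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def radius_of_gyration_km(points):
    """RMS distance (km) of the points from their approximate centroid."""
    if not points:
        raise ValueError("at least one point is required")
    # Assumption: the arithmetic mean of coordinates stands in for the centroid.
    centroid = (
        sum(p[0] for p in points) / len(points),
        sum(p[1] for p in points) / len(points),
    )
    mean_sq = sum(haversine_km(p, centroid) ** 2 for p in points) / len(points)
    return math.sqrt(mean_sq)
```

A user who tweets from a single place has a radius of gyration of zero; the value grows as the recorded locations spread out.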

arXiv.org e-Print Archive

DSpace@MIT

PubMed Central

Scaling of city attractiveness for foreign visitors through big data of human economical and social media activity

Author: Arias Juan Murillo
Belyi Alexander
Bojic Iva
Hawelka Bartosz
Ratti Carlo
Sitko Izabela
Sobolevsky Stanislav
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/04/2015

Scientific studies investigating laws and regularities of human behavior increasingly rely on the wealth of widely available digital information produced by human social activity. In this paper we leverage big data created by three different aspects of human activity (i.e., bank card transactions, geotagged photographs and tweets) in Spain to quantify city attractiveness for foreign visitors. An important finding of this paper is a strong superlinear scaling of city attractiveness with population size. The observed scaling exponent stays nearly the same for different ways of defining cities and for different data sources, emphasizing the robustness of our finding. Temporal variation of the scaling exponent is also considered in order to reveal seasonal patterns in the attractiveness. Comment: 8 pages, 3 figures, 1 table
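Superlinear scaling means that attractiveness A grows with population N roughly as A ≈ c·N^β with β > 1. One common way to estimate β is an ordinary least-squares fit on log-transformed values; the sketch below assumes that approach, which may differ from the fitting procedure used in the paper. The function name is hypothetical.

```python
import math


def fit_scaling_exponent(populations, attractiveness):
    """Fit A = c * N**beta by least squares on log-log values.

    Returns (beta, c). All inputs must be positive.
    """
    xs = [math.log(n) for n in populations]
    ys = [math.log(a) for a in attractiveness]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                # slope of the log-log line
    log_c = mean_y - beta * mean_x  # intercept of the log-log line
    return beta, math.exp(log_c)
```

An estimated β above 1 indicates superlinear scaling, β equal to 1 linear scaling, and β below 1 sublinear scaling.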

arXiv.org e-Print Archive

DSpace@MIT

Crossref

Mining Urban Performance: Scale-Independent Classification of Cities Based on Individual Economic Transactions

Author: Arias Juan Murillo
Combes Remi Tachet des
Grauwin Sebastian
Hawelka Bartosz
Ratti Carlo
Sitko Izabela
Sobolevsky Stanislav
Publication date: 01/05/2014

Intensive development of urban systems creates a number of challenges for urban planners and policy makers who seek to maintain sustainable growth. Running efficient urban policies requires meaningful urban metrics that can quantify important urban characteristics, including various aspects of actual human behavior. Since city size is known to have a major, yet often nonlinear, impact on human activity, it also becomes important to develop scale-free metrics that capture qualitative city properties beyond the effects of scale. The recent availability of extensive datasets created by human activity involving digital technologies creates new opportunities in this area. In this paper we propose a novel approach to city scoring and classification based on quantitative scale-free metrics related to the economic activity of city residents, as well as domestic and foreign visitors. It is demonstrated on the example of Spain, but the proposed methodology is of a general character. We employ a new source of large-scale ubiquitous data, which consists of anonymized countrywide records of bank card transactions collected by one of the largest Spanish banks. Different aspects of the classification reveal important properties of Spanish cities, which significantly complement the pattern that might be discovered with the official socioeconomic statistics. Comment: 10 pages, 7 figures, to be published in the proceedings of ASE BigDataScience 2014 conference

arXiv.org e-Print Archive

DSpace@MIT

Collective Prediction of Individual Mobility Traces for Users with Short Data History

Author: Bartosz Hawelka (2156686)
Euro Beinat (3704821)
Izabela Sitko (712806)
Pavlos Kazakopoulos (3704824)
Publication date: 30/01/2017

We present and test a sequential learning algorithm for the prediction of human mobility that leverages large datasets of sequences to improve prediction accuracy, in particular for users with a short and non-repetitive data history such as tourists in a foreign country. The algorithm compensates for the difficulty of predicting the next location when there is limited evidence of past behavior by leveraging the availability of sequences of other users in the same system that provide redundant records of typical behavioral patterns. We test the method on a dataset of 10 million roaming mobile phone users in a European country. The average prediction accuracy is significantly higher than that of individual sequence prediction algorithms, primarily constant order Markov models derived from the user’s own data, that have been shown to achieve high accuracy in previous studies of human mobility. The proposed algorithm is generally applicable to improve any sequential prediction when there is a sufficiently rich and diverse dataset of sequences.

Directory of Open Access Journals

PubMed Central

FigShare

Correct/incorrect prediction for given position for three selected sequences.

Author: Bartosz Hawelka (2156686)
Euro Beinat (3704821)
Izabela Sitko (712806)
Pavlos Kazakopoulos (3704824)

Prediction accuracy in our setup depends crucially on the availability of good experts in the ensemble. In the three example sequences we see a color-coded depiction of prediction success or failure adjacent to the numbers of awake and best experts, i.e. experts that can provide a prediction at a given step, and those among them which have accumulated the minimum loss up to that step. The three sequences are rather typical examples seen in the test dataset. Low numbers of best and awake experts almost invariably lead to incorrect predictions, and vice versa.

FigShare

Comparison with the best expert in the ensemble.

Author: Bartosz Hawelka (2156686)
Euro Beinat (3704821)
Izabela Sitko (712806)
Pavlos Kazakopoulos (3704824)

The best expert here is declared at the end of the sequence, as the Markov model in the expert ensemble which accumulated the minimum loss during prediction. If more than one expert shares this property, a representative is chosen arbitrarily. (A) The EW forecaster’s prediction accuracy compared to the best expert’s prediction accuracy. The forecaster’s accuracy is superior more often than not, and with larger differences, resulting in a 4% average advantage. (B) The O(1) Markov model constructed sequentially from the user’s own locations as they are recorded in real time is less accurate than the best expert for a large majority of the test sequences. It may appear slightly surprising that another user’s data is better at predicting a given user’s location sequence, but the user’s own Markov model is constructed sequentially, needing time to learn the patterns, while experts’ Markov models enter the “competition” fully constructed.
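The point about the user's own model "needing time to learn the patterns" can be illustrated with a first-order Markov predictor that is built one observation at a time. This is a minimal sketch under assumptions, not the paper's code: locations are arbitrary hashable labels, the predictor returns the most frequent successor of the previous location, and a step with no prediction available is counted as incorrect. The class and function names are hypothetical.

```python
from collections import Counter, defaultdict


class SequentialMarkov1:
    """Order-1 Markov predictor updated after every observed location."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # previous location -> successor counts
        self.prev = None

    def predict(self):
        """Most frequent successor of the previous location, or None if unseen."""
        followers = self.counts.get(self.prev)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def update(self, location):
        """Record the transition prev -> location and advance."""
        if self.prev is not None:
            self.counts[self.prev][location] += 1
        self.prev = location


def sequential_accuracy(sequence):
    """Fraction of steps predicted correctly when predicting before each update."""
    model = SequentialMarkov1()
    correct = 0
    for location in sequence:
        if model.predict() == location:
            correct += 1
        model.update(location)
    return correct / len(sequence) if sequence else 0.0
```

On a perfectly alternating sequence of two locations, this predictor misses the first three steps because no relevant transition has been observed yet, then gets every later step right.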

FigShare

Prediction per position and over an hour of a day.

Author: Bartosz Hawelka (2156686)
Euro Beinat (3704821)
Izabela Sitko (712806)
Pavlos Kazakopoulos (3704824)

(A) Average prediction accuracy per position n in the sequence, for the EW forecaster and Markov models of orders k = 1, 2, 3. The best Markov model is O(1) and is on par with the EW forecaster for the first half-day after the start of the user’s sequence and the prediction process. EW achieves a stable (average) lead after that point. The quasi-periodic pattern is due to the fact that most roamers arrive in the visited country during the day, combined with the fluctuation between day and night prediction accuracies seen in (B). Prediction accuracy is significantly higher in the period between 02:00 and 08:00 because of the much higher regularity of mobility patterns during these hours.

FigShare

EW forecaster prediction accuracy.

Author: Bartosz Hawelka (2156686)
Euro Beinat (3704821)
Izabela Sitko (712806)
Pavlos Kazakopoulos (3704824)

(A) Percentage of sequences predicted with a certain accuracy (in bins of 10%) for the EW forecaster and Markov models of order k = 1, 2, 3 constructed sequentially from the user’s own data as the sequence of locations is observed in time. We use a learning rate η = 3. The EW forecaster improves on the performance of the best Markov model, which again turns out to be O(1) [27, 32], by an average of 5%. A detailed comparison between the two is depicted in (B), the scatterplot of difference in prediction accuracy per sequence. For more than 90% of the test sequences, the EW forecaster is more accurate.
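The role of the learning rate η can be sketched with a generic exponentially weighted (EW) vote over experts. This is a hedged illustration and not the paper's exact forecaster: it assumes 0/1 loss, weights each awake expert by exp(−η × cumulative loss), and charges loss only to experts that were awake at the step. The function names and the dictionary-based interface are hypothetical.

```python
import math
from collections import defaultdict


def ew_predict(expert_predictions, cumulative_loss, eta=3.0):
    """Weighted vote over awake experts.

    expert_predictions maps expert id -> predicted location, or None if the
    expert is asleep (cannot predict at this step).
    """
    votes = defaultdict(float)
    for expert, prediction in expert_predictions.items():
        if prediction is None:
            continue  # asleep experts do not vote
        votes[prediction] += math.exp(-eta * cumulative_loss.get(expert, 0.0))
    if not votes:
        return None  # no awake expert, so no prediction
    return max(votes, key=votes.get)


def ew_update(expert_predictions, cumulative_loss, actual):
    """Add 0/1 loss to every awake expert that predicted wrongly."""
    for expert, prediction in expert_predictions.items():
        if prediction is not None and prediction != actual:
            cumulative_loss[expert] = cumulative_loss.get(expert, 0.0) + 1.0
```

With η = 3, a single mistake multiplies an expert's weight by exp(−3) ≈ 0.05, so one accurate expert quickly outweighs several experts that have each erred once.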

FigShare

Prediction accuracy dependence on sampling and Tpast.

Author: Bartosz Hawelka (2156686)
Euro Beinat (3704821)
Izabela Sitko (712806)
Pavlos Kazakopoulos (3704824)

(A) Average prediction accuracy for particular filterings of the expert ensemble. We randomly sample experts from the ensemble and additionally filter the experts’ sequence fragments so that only those that end within a time window Tpast are included. Decreasing the sampling rate and/or reducing Tpast decimates the ensemble, and beyond a certain point this reduces the accuracy of the forecaster. (B) The average percentage of distinct transitions X_{n−1} → X_n in a test sequence that are contained by at least one expert in the ensemble after filtering. Prediction accuracy in (A) starts dropping only when the sampling rate is reduced below a few percent, showing that the ensemble is very diverse and robust. A very slight drop in performance comes with including all experts, due to the logarithmic search costs of the forecaster when the ensemble grows.

FigShare